AAAI.2018 - NLP and Knowledge Representation

Total: 27

#1 Syntax-Directed Attention for Neural Machine Translation

Authors: Kehai Chen ; Rui Wang ; Masao Utiyama ; Eiichiro Sumita ; Tiejun Zhao

Attention mechanisms, including global attention and local attention, play a key role in neural machine translation (NMT). Global attention attends to all source words for word prediction. In comparison, local attention selectively looks at a fixed window of source words. However, alignment weights for the current target word often decay to the left and right by linear distance centered on the aligned source position, neglecting syntax-distance constraints. In this paper, we extend local attention with a syntax-distance constraint, which focuses on source words syntactically related to the predicted target word in order to learn a more effective context vector for predicting the translation. Moreover, we propose a double-context NMT architecture, consisting of a global context vector and a syntax-directed context vector derived from the global attention, to provide the decoder with richer source-side information. Experiments on large-scale Chinese-to-English and English-to-German translation tasks show that the proposed approach achieves substantial and significant improvements over the baseline system.
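
As an illustration only (not the authors' released code), the sketch below shows one way a local attention window can be restricted by syntactic distance: attention scores are masked for source positions whose dependency-tree distance from the aligned position exceeds a threshold. The function name, the toy distance matrix and the threshold are all assumptions made for the example.

```python
import numpy as np

def syntax_directed_attention(query, source_states, aligned_pos, syn_dist, max_dist=2):
    """Toy local attention masking source words whose syntactic distance from the
    aligned source position exceeds max_dist (illustrative only)."""
    scores = source_states @ query                    # dot-product scores, shape (src_len,)
    mask = syn_dist[aligned_pos] <= max_dist          # keep syntactically close words
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores[mask].max())
    weights = weights / weights.sum()
    return weights @ source_states                    # syntax-directed context vector

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))                           # 4 source states, hidden size 3
q = rng.normal(size=3)                                # current decoder query
D = np.array([[0, 1, 2, 3],                           # hypothetical dependency-tree distances
              [1, 0, 2, 3],
              [2, 2, 0, 1],
              [3, 3, 1, 0]])
print(syntax_directed_attention(q, H, aligned_pos=1, syn_dist=D))
```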

#2 A Semantic QA-Based Approach for Text Summarization Evaluation

Authors: Ping Chen ; Fei Wu ; Tong Wang ; Wei Ding

Many Natural Language Processing and Computational Linguistics applications involve generating new texts based on existing texts, such as summarization, text simplification and machine translation. However, a serious problem has haunted these applications for decades: how to automatically and accurately assess the quality of their output. In this paper, we present some preliminary results on one especially useful and challenging problem in NLP system evaluation---how to pinpoint content differences between two text passages (especially large passages such as articles and books). Our idea is intuitive and very different from existing approaches. We treat one text passage as a small knowledge base and ask it a large number of questions to exhaustively identify all content points in it. By comparing the correctly answered questions from two text passages, we can compare their content precisely. An experiment using the 2007 DUC summarization corpus clearly shows promising results.

#3 Faithful to the Original: Fact Aware Neural Abstractive Summarization

Authors: Ziqiang Cao ; Furu Wei ; Wenjie Li ; Sujian Li

Unlike extractive summarization, abstractive summarization has to fuse different parts of the source text, which makes it prone to creating fake facts. Our preliminary study reveals that nearly 30% of the outputs from a state-of-the-art neural summarization system suffer from this problem. While previous abstractive summarization approaches usually focus on improving informativeness, we argue that faithfulness is also a vital prerequisite for a practical abstractive summarization system. To avoid generating fake facts in a summary, we leverage open information extraction and dependency parsing technologies to extract actual fact descriptions from the source text. We then propose a dual-attention sequence-to-sequence framework to condition generation on both the source text and the extracted fact descriptions. Experiments on the Gigaword benchmark dataset demonstrate that our model can reduce fake summaries by 80%. Notably, the fact descriptions also bring a significant improvement in informativeness, since they often condense the meaning of the source text.
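
A minimal sketch of the dual-attention idea described above, assuming a simple dot-product attention and a scalar gate to blend the source context with the fact-description context; names and shapes are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dual_attention_context(dec_state, src_states, fact_states, gate_w):
    """Illustrative dual attention: attend separately over the source text and the
    extracted fact descriptions, then blend the two context vectors with a gate."""
    c_src = softmax(src_states @ dec_state) @ src_states
    c_fact = softmax(fact_states @ dec_state) @ fact_states
    g = 1.0 / (1.0 + np.exp(-gate_w @ np.concatenate([c_src, c_fact])))  # scalar gate
    return g * c_src + (1.0 - g) * c_fact

rng = np.random.default_rng(1)
src = rng.normal(size=(6, 4))      # 6 source tokens, hidden size 4
facts = rng.normal(size=(3, 4))    # 3 OpenIE-style fact descriptions
s_t = rng.normal(size=4)           # current decoder state
w = rng.normal(size=8)             # hypothetical gate parameters
print(dual_attention_context(s_t, src, facts, w))
```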

#4 Translating Pro-Drop Languages With Reconstruction Models

Authors: Longyue Wang ; Zhaopeng Tu ; Shuming Shi ; Tong Zhang ; Yvette Graham ; Qun Liu

Pronouns are frequently omitted in pro-drop languages, such as Chinese, generally leading to significant challenges with respect to the production of complete translations. To date, very little attention has been paid to the dropped pronoun (DP) problem within neural machine translation (NMT). In this work, we propose a novel reconstruction-based approach to alleviating DP translation problems for NMT models. Firstly, DPs within all source sentences are automatically annotated with parallel information extracted from the bilingual training corpus. Next, the annotated source sentence is reconstructed from hidden representations in the NMT model. With auxiliary training objectives, in terms of reconstruction scores, the parameters associated with the NMT model are guided to produce enhanced hidden representations that are encouraged as much as possible to embed the annotated DP information. Experimental results on both Chinese-English and Japanese-English dialogue translation tasks show that the proposed approach significantly and consistently improves translation performance over a strong NMT baseline, which is directly built on the training data annotated with DPs.

#5 Recognizing and Justifying Text Entailment Through Distributional Navigation on Definition Graphs

Authors: Vivian Silva ; Siegfried Handschuh ; André Freitas

Text entailment, the task of determining whether a piece of text logically follows from another piece of text, has become an important component of many natural language processing tasks, such as question answering and information retrieval. For entailments requiring world knowledge, most systems still work as a "black box," providing a yes/no answer that does not explain the reasoning behind it. We propose an interpretable text entailment approach that, given a structured definition graph, uses a navigation algorithm based on distributional semantic models to find a path in the graph which links text and hypothesis. If such a path is found, it is used to provide a human-readable justification explaining why the entailment holds. Experiments show that the proposed approach achieves results comparable to those of some well-established entailment algorithms, while also meeting Explainable AI requirements, supplying clear explanations that allow the inference model to be interpreted.

#6 Sequential Copying Networks

Authors: Qingyu Zhou ; Nan Yang ; Furu Wei ; Ming Zhou

The copying mechanism has proven effective in sequence-to-sequence neural network models for text generation tasks, such as abstractive sentence summarization and question generation. However, existing work on modeling the copying or pointing mechanism only considers copying single words from the source sentences. In this paper, we propose a novel copying framework, named Sequential Copying Networks (SeqCopyNet), which not only learns to copy single words but also copies sequences from the input sentence. It leverages pointer networks to explicitly select a sub-span on the source side and copy it to the target side, and integrates this sequential copying mechanism into the generation process of the encoder-decoder paradigm. Experiments on abstractive sentence summarization and question generation tasks show that the proposed SeqCopyNet can copy meaningful spans and outperforms the baseline models.
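
The span-copy idea can be sketched as two pointer decisions, one for the start and one for the end of the copied sub-span. This is a toy illustration under assumed dot-product pointer scores, not the actual SeqCopyNet architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def copy_span(dec_state, src_states, src_tokens):
    """Illustrative sequential copy step: point to a start position, then to an
    end position at or after it, and copy the whole sub-span."""
    start_probs = softmax(src_states @ dec_state)
    start = int(start_probs.argmax())
    end_scores = src_states[start:] @ dec_state          # only positions >= start
    end = start + int(softmax(end_scores).argmax())
    return src_tokens[start:end + 1]

rng = np.random.default_rng(2)
tokens = ["the", "world", "health", "organization", "said", "today"]
H = rng.normal(size=(len(tokens), 4))                    # toy encoder states
s_t = rng.normal(size=4)                                 # toy decoder state
print(copy_span(s_t, H, tokens))
```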

#7 Learning Sentiment-Specific Word Embedding via Global Sentiment Representation

Authors: Peng Fu ; Zheng Lin ; Fengcheng Yuan ; Weiping Wang ; Dan Meng

Context-based word embedding learning approaches can model rich semantic and syntactic information. However, they are problematic for sentiment analysis because words with similar contexts but opposite sentiment polarities, such as good and bad, are mapped to close word vectors in the embedding space. Recently, some sentiment embedding learning methods have been proposed, but most of them are designed to work well on sentence-level texts. Directly applying those models to document-level texts often leads to unsatisfactory results. To address this issue, we present a sentiment-specific word embedding learning architecture that utilizes local context information as well as a global sentiment representation. The architecture is applicable to both sentence-level and document-level texts. We take the global sentiment representation to be a simple average of the word embeddings in the text, and use a corruption strategy as a sentiment-dependent regularization. Extensive experiments conducted on several benchmark datasets demonstrate that the proposed architecture outperforms the state-of-the-art methods for sentiment classification.

#8 Improving Sequence-to-Sequence Constituency Parsing

Authors: Lemao Liu ; Muhua Zhu ; Shuming Shi

Sequence-to-sequence constituency parsing casts the tree-structured prediction problem as a general sequential problem by top-down tree linearization, and thus it is very easy to train in parallel with distributed facilities. Despite its success, it relies on a general-purpose probabilistic attention mechanism, which cannot guarantee that the selected context is informative in the specific parsing scenario. Previous work introduced a deterministic attention to select informative context for sequence-to-sequence parsing, but it is based on bottom-up linearization, even though top-down linearization has been observed to be better than bottom-up linearization for standard sequence-to-sequence constituency parsing. In this paper, we therefore extend the deterministic attention to operate directly on the top-down tree linearization. Intensive experiments show that our parser delivers substantial improvements in accuracy over the bottom-up linearization, achieving an F-score of 92.3 on section 23 of the Penn English Treebank and 85.4 on the Penn Chinese Treebank test set, without reranking or semi-supervised training.

#9 Knowledge Graph Embedding With Iterative Guidance From Soft Rules

Authors: Shu Guo ; Quan Wang ; Lihong Wang ; Bin Wang ; Li Guo

Embedding knowledge graphs (KGs) into continuous vector spaces is a focus of current research. Combining such an embedding model with logic rules has recently attracted increasing attention. Most previous attempts made a one-time injection of logic rules, ignoring the interplay between embedding learning and logical inference. Moreover, they focused only on hard rules, which always hold without exception and usually require extensive manual effort to create or validate. In this paper, we propose Rule-Guided Embedding (RUGE), a novel paradigm of KG embedding with iterative guidance from soft rules. RUGE enables an embedding model to learn simultaneously from 1) labeled triples that have been directly observed in a given KG, 2) unlabeled triples whose labels are to be predicted iteratively, and 3) soft rules with various confidence levels extracted automatically from the KG. In the learning process, RUGE iteratively queries rules to obtain soft labels for unlabeled triples, and integrates such newly labeled triples to update the embedding model. Through this iterative procedure, knowledge embodied in logic rules may be better transferred into the learned embeddings. We evaluate RUGE in link prediction on Freebase and YAGO. Experimental results show that: 1) with rule knowledge injected iteratively, RUGE achieves significant and consistent improvements over state-of-the-art baselines; and 2) despite their uncertainties, automatically extracted soft rules are highly beneficial to KG embedding, even those with moderate confidence levels. The code and data used for this paper can be obtained from https://github.com/iieir-km/RUGE.
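
A schematic rendering of the iterative guidance loop described in the abstract, assuming a single implication-style soft rule and a stubbed-out embedding update; the real RUGE code is at the GitHub link above.

```python
def rule_soft_labels(observed, unlabeled, rules):
    """Ground each soft rule (body_rel(x,y) => head_rel(x,y), confidence) against
    the observed triples to obtain soft labels for unlabeled triples."""
    labels = {}
    for (h, r, t) in unlabeled:
        for (body_rel, head_rel), conf in rules:
            if head_rel == r and observed.get((h, body_rel, t), 0.0) >= 0.5:
                labels[(h, r, t)] = max(labels.get((h, r, t), 0.0), conf)
    return labels

def update_embeddings(triples_with_labels):
    # Placeholder for one epoch of embedding updates weighted by the (soft) labels.
    for triple, label in triples_with_labels.items():
        pass

observed = {("paris", "capital_of", "france"): 1.0}      # directly observed triple
unlabeled = [("paris", "located_in", "france")]           # candidate triple to label
rules = [(("capital_of", "located_in"), 0.9)]             # hypothetical soft rule, conf 0.9

for _ in range(3):                                        # iterative guidance
    soft = rule_soft_labels(observed, unlabeled, rules)
    update_embeddings({**observed, **soft})               # labeled + softly labeled triples
print(soft)
```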

#10 Event Detection via Gated Multilingual Attention Mechanism

Authors: Jian Liu ; Yubo Chen ; Kang Liu ; Jun Zhao

Identifying event instances in text plays a critical role in building NLP applications such as Information Extraction (IE) systems. However, most existing methods for this task focus only on monolingual clues in a specific language and ignore the massive information provided by other languages. Data scarcity and monolingual ambiguity hinder the performance of these monolingual approaches. In this paper, we propose a novel multilingual approach---dubbed the Gated Multilingual Attention (GMLATT) framework---to address the two issues simultaneously. Specifically, to alleviate the data scarcity problem, we exploit the consistent information in multilingual data via a context attention mechanism, which takes advantage of the consistent evidence in multilingual data rather than learning only from monolingual data. To deal with the monolingual ambiguity problem, we propose gated cross-lingual attention to exploit the complementary information conveyed by multilingual data, which is helpful for disambiguation. The cross-lingual attention gate serves as a sentinel that models the confidence of the clues provided by other languages and controls the information integration across languages. We have conducted extensive experiments on the ACE 2005 benchmark. Experimental results show that our approach significantly outperforms state-of-the-art methods.
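
A rough sketch of the gating idea: a sentinel gate estimates how much to trust the context gathered from the other-language sentence before adding it to the monolingual context. All parameter names and shapes are illustrative assumptions, not the GMLATT equations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gated_multilingual_context(word_state, mono_states, cross_states, gate_w):
    """Illustrative gated cross-lingual attention: a confidence gate controls how
    much of the cross-lingual context is blended into the monolingual context."""
    c_mono = softmax(mono_states @ word_state) @ mono_states
    c_cross = softmax(cross_states @ word_state) @ cross_states
    g = sigmoid(gate_w @ np.concatenate([word_state, c_cross]))   # sentinel gate
    return c_mono + g * c_cross

rng = np.random.default_rng(3)
h = rng.normal(size=4)                # candidate trigger word representation
mono = rng.normal(size=(5, 4))        # source-language sentence states
cross = rng.normal(size=(6, 4))       # other-language counterpart states
w = rng.normal(size=8)                # hypothetical gate parameters
print(gated_multilingual_context(h, mono, cross, w))
```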

#11 Never Retreat, Never Retract: Argumentation Analysis for Political Speeches

Authors: Stefano Menini ; Elena Cabrio ; Sara Tonelli ; Serena Villata

In this work, we apply argumentation mining techniques, in particular relation prediction, to study political speeches in monological form, where there is no direct interaction between opponents. We argue that this kind of technique can effectively support researchers in history and in the social and political sciences, who must deal with an increasing amount of data in digital form and need ways to automatically extract and analyse argumentation patterns. We test and discuss our approach on the analysis of documents issued by R. Nixon and J. F. Kennedy during the 1960 presidential campaign. We rely on a supervised classifier to predict argument relations (i.e., support and attack), obtaining an accuracy of 0.72 on a dataset of 1,462 argument pairs. The application of argument mining to such data allows us not only to highlight the main points of agreement and disagreement between the candidates' arguments on campaign issues such as Cuba, disarmament and health care, but also to carry out an in-depth argumentative analysis of their respective viewpoints on these topics.

#12 280 Birds With One Stone: Inducing Multilingual Taxonomies From Wikipedia Using Character-Level Classification

Authors: Amit Gupta ; Rémi Lebret ; Hamza Harkous ; Karl Aberer

We propose a novel fully-automated approach towards inducing multilingual taxonomies from Wikipedia. Given an English taxonomy, our approach first leverages the interlanguage links of Wikipedia to automatically construct training datasets for the isa relation in the target language. Character-level classifiers are trained on the constructed datasets and used in an optimal path discovery framework to induce high-precision, high-coverage taxonomies in other languages. Through experiments, we demonstrate that our approach significantly outperforms the state-of-the-art, heuristics-heavy approaches for six languages. As a consequence of our work, we release what is presumably the largest and most accurate multilingual taxonomic resource, spanning over 280 languages.

#13 AMR Parsing With Cache Transition Systems

Authors: Xiaochang Peng ; Daniel Gildea ; Giorgio Satta

In this paper, we present a transition system that generalizes transition-based dependency parsing techniques to generate AMR graphs rather than tree structures. In addition to a buffer and a stack, we use a fixed-size cache, and allow the system to build arcs to any vertices present in the cache at the same time. The size of the cache provides a parameter that can trade off between the complexity of the graphs that can be built and the ease of predicting actions during parsing. Our results show that a cache transition system can cover almost all AMR graphs with a small cache size, and that our end-to-end system achieves competitive results in comparison with other transition-based approaches to AMR parsing.
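
To make the data structures concrete, here is a toy illustration (not the paper's full action set or oracle) of a buffer, a stack and a fixed-size cache, where arcs may be built between vertices currently held in the cache.

```python
class CacheTransitionState:
    """Toy state for a cache transition system: a buffer of unprocessed tokens,
    a stack for displaced vertices, and a fixed-size cache (illustrative only)."""

    def __init__(self, tokens, cache_size=2):
        self.buffer = list(tokens)          # unprocessed tokens
        self.stack = []                     # vertices displaced from the cache
        self.cache = [None] * cache_size    # fixed-size cache of recent vertices
        self.arcs = []                      # (head, label, dependent) triples

    def shift(self, cache_slot):
        """Move the next token into a cache slot, pushing any displaced vertex."""
        vertex = self.buffer.pop(0)
        if self.cache[cache_slot] is not None:
            self.stack.append(self.cache[cache_slot])
        self.cache[cache_slot] = vertex

    def arc(self, slot_a, slot_b, label):
        """Build an arc between two cached vertices (both stay available)."""
        self.arcs.append((self.cache[slot_a], label, self.cache[slot_b]))

state = CacheTransitionState(["boy", "want", "go"], cache_size=2)
state.shift(0)           # cache: [boy, None]
state.shift(1)           # cache: [boy, want]
state.arc(1, 0, "ARG0")  # want --ARG0--> boy
print(state.arcs, state.stack, state.buffer)
```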

#14 Augmenting End-to-End Dialogue Systems With Commonsense Knowledge

Authors: Tom Young ; Erik Cambria ; Iti Chaturvedi ; Hao Zhou ; Subham Biswas ; Minlie Huang

Building dialogue systems that can converse naturally with humans is a challenging yet intriguing problem of artificial intelligence. In open-domain human-computer conversation, where the conversational agent is expected to respond to human utterances in an interesting and engaging way, commonsense knowledge has to be integrated into the model effectively. In this paper, we investigate the impact of providing commonsense knowledge about the concepts covered in the dialogue. Our model represents the first attempt to integrate a large commonsense knowledge base into end-to-end conversational models. In the retrieval-based scenario, we propose a model that jointly takes into account message content and related commonsense for selecting an appropriate response. Our experiments suggest that the knowledge-augmented models are superior to their knowledge-free counterparts.

#15 Does William Shakespeare REALLY Write Hamlet? Knowledge Representation Learning With Confidence

Authors: Ruobing Xie ; Zhiyuan Liu ; Fen Lin ; Leyu Lin

Knowledge graphs (KGs), which provide essential relational information between entities, have been widely utilized in various knowledge-driven applications. Since overall human knowledge is vast, still grows explosively and changes frequently, knowledge construction and updating inevitably involve automatic mechanisms with little human supervision, which usually introduce plenty of noise and conflicts into KGs. However, most conventional knowledge representation learning methods assume that all triple facts in existing KGs share the same significance and contain no noise. To address this problem, we propose a novel confidence-aware knowledge representation learning framework (CKRL), which detects possible noise in KGs while simultaneously learning knowledge representations with confidence. Specifically, we introduce triple confidence into conventional translation-based methods for knowledge representation learning. To make triple confidence more flexible and universal, we only utilize the internal structural information in KGs, and propose three kinds of triple confidence considering both local and global structural information. In experiments, we evaluate our models on knowledge graph noise detection, knowledge graph completion and triple classification. Experimental results demonstrate that our confidence-aware models achieve significant and consistent improvements on all tasks, which confirms the capability of CKRL to model confidence with structural information for both KG noise detection and knowledge representation learning.
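
One way to picture "learning with confidence" in a translation-based model is to scale the usual margin loss of each triple by its confidence, so suspected noise contributes less to the updates. The following is a hedged sketch with made-up embeddings, not the CKRL objective itself.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility: a smaller ||h + r - t|| means a more plausible triple."""
    return np.linalg.norm(h + r - t, ord=1)

def confidence_weighted_margin_loss(pos, neg, confidence, margin=1.0):
    """Illustrative confidence-aware objective: a standard margin loss for a
    translation-based model, scaled by the triple's confidence."""
    h, r, t = pos
    h_n, r_n, t_n = neg
    loss = max(0.0, margin + transe_score(h, r, t) - transe_score(h_n, r_n, t_n))
    return confidence * loss

rng = np.random.default_rng(4)
emb = {name: rng.normal(size=5) for name in ["shakespeare", "wrote", "hamlet", "macbeth"]}
pos = (emb["shakespeare"], emb["wrote"], emb["hamlet"])
neg = (emb["shakespeare"], emb["wrote"], emb["macbeth"])   # corrupted (negative) triple
print(confidence_weighted_margin_loss(pos, neg, confidence=0.8))
```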

#16 Table-to-Text Generation by Structure-Aware Seq2seq Learning

Authors: Tianyu Liu ; Kexiang Wang ; Lei Sha ; Baobao Chang ; Zhifang Sui

Table-to-text generation aims to generate a description for a factual table, which can be viewed as a set of field-value records. To encode both the content and the structure of a table, we propose a novel structure-aware seq2seq architecture which consists of a field-gating encoder and a description generator with dual attention. In the encoding phase, we update the cell memory of the LSTM unit by a field gate and its corresponding field value in order to incorporate field information into the table representation. In the decoding phase, a dual attention mechanism containing word-level attention and field-level attention is proposed to model the semantic relevance between the generated description and the table. We conduct experiments on the WIKIBIO dataset, which contains over 700k biographies and corresponding infoboxes from Wikipedia. The attention visualizations and case studies show that our model is capable of generating coherent and informative descriptions based on a comprehensive understanding of both the content and the structure of a table. Automatic evaluations also show that our model outperforms the baselines by a large margin. Code for this work is available at https://github.com/tyliupku/wiki2bio.
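
A simplified sketch of the field-gating intuition: alongside the standard LSTM gates, an extra gate driven by the field (column-name) embedding writes field information into the cell memory. The parameterization below is an assumption for illustration, not the paper's exact equations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def field_gated_lstm_step(x, field, h_prev, c_prev, W, W_f):
    """Illustrative LSTM step with an extra field gate that injects the field
    embedding into the cell memory (shapes and weights are toy assumptions)."""
    d = h_prev.size
    z = W @ np.concatenate([x, h_prev])                 # stacked gate pre-activations
    i, f, o, g = z[:d], z[d:2*d], z[2*d:3*d], z[3*d:]
    fld = W_f @ field                                   # field gate pre-activation
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g) + sigmoid(fld) * np.tanh(field)
    h = sigmoid(o) * np.tanh(c)
    return h, c

rng = np.random.default_rng(5)
d = 4
W = rng.normal(size=(4 * d, 2 * d))
W_f = rng.normal(size=(d, d))
x = rng.normal(size=d)        # embedding of a cell value, e.g. "1879"
fld = rng.normal(size=d)      # embedding of its field name, e.g. "birth_date"
h, c = field_gated_lstm_step(x, fld, np.zeros(d), np.zeros(d), W, W_f)
print(h)
```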

#17 An Unsupervised Model With Attention Autoencoders for Question Retrieval

Authors: Minghua Zhang ; Yunfang Wu

Question retrieval is a crucial subtask of community question answering. Previous research has focused on supervised models which depend heavily on training data and manual feature engineering. In this paper, we propose a novel unsupervised framework, namely the reduced attentive matching network (RAMN), to compute semantic matching between two questions. Our RAMN integrates the deep semantic representations, the shallow lexical mismatching information and the initial rank produced by an external search engine. For the first time, we propose attention autoencoders to generate semantic representations of questions. In addition, we employ lexical mismatching to capture surface matching between two questions, derived from the importance of each word in a question. We conduct experiments on the open CQA datasets of SemEval-2016 and SemEval-2017. The experimental results show that our unsupervised model obtains performance comparable to the state-of-the-art supervised methods on SemEval-2016 Task 3, and outperforms the best system in SemEval-2017 Task 3 by a wide margin.

#18 Neural Knowledge Acquisition via Mutual Attention Between Knowledge Graph and Text

Authors: Xu Han ; Zhiyuan Liu ; Maosong Sun

We propose a general joint representation learning framework for knowledge acquisition (KA) on two tasks, knowledge graph completion (KGC) and relation extraction (RE) from text. In this framework, we learn representations of knowledge graphs (KGs) and text within a unified parameter sharing semantic space. To achieve better fusion, we propose an effective mutual attention between KGs and text. The reciprocal attention mechanism enables us to highlight important features and perform better KGC and RE. Different from conventional joint models, no complicated linguistic analysis or strict alignments between KGs and text are required to train our models. Experiments on relation extraction and entity link prediction show that models trained under our joint framework are significantly improved in comparison with other baselines. Most existing methods for KGC and RE can be easily integrated into our framework due to its flexible architectures. The source code of this paper can be obtained from https://github.com/thunlp/JointNRE.

#19 Deep Semantic Role Labeling With Self-Attention

Authors: Zhixing Tan ; Mingxuan Wang ; Jun Xie ; Yidong Chen ; Xiaodong Shi

Semantic Role Labeling (SRL) is believed to be a crucial step towards natural language understanding and has been widely studied. In recent years, end-to-end SRL with recurrent neural networks (RNNs) has gained increasing attention. However, it remains a major challenge for RNNs to handle structural information and long-range dependencies. In this paper, we present a simple and effective architecture for SRL which aims to address these problems. Our model is based on self-attention, which can directly capture the relationship between two tokens regardless of their distance. Our single model achieves F1=83.4 on the CoNLL-2005 shared task dataset and F1=82.7 on the CoNLL-2012 shared task dataset, outperforming the previous state-of-the-art results by 1.8 and 1.0 F1 points, respectively. Moreover, our model is computationally efficient, with a parsing speed of 50K tokens per second on a single Titan X GPU.
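
For readers unfamiliar with the mechanism, a generic scaled dot-product self-attention layer looks like the sketch below; it shows why any two tokens interact in a single step regardless of their distance. This is standard self-attention, not the paper's full layer stack.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: every token attends to every other token,
    so the path length between any two positions is one."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(6)
X = rng.normal(size=(7, 8))                   # 7 tokens of a sentence, hidden size 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (7, 8)
```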

#20 Multi-Channel Encoder for Neural Machine Translation

Authors: Hao Xiong ; Zhongjun He ; Xiaoguang Hu ; Hua Wu

The attention-based encoder-decoder is an effective architecture for neural machine translation (NMT); it typically relies on recurrent neural networks (RNNs) to build the blocks that are later consulted by the attentive reader during decoding. This encoder design yields a relatively uniform composition of the source sentence, despite the gating mechanism employed in the encoding RNN. On the other hand, we often want the decoder to take in pieces of the source sentence at varying levels of composition suiting its own linguistic structure: for example, we may want to take an entity name in its raw form while taking an idiom as a fully composed unit. Motivated by this demand, we propose the Multi-channel Encoder (MCE), which enhances the encoding components with different levels of composition. More specifically, in addition to the hidden state of the encoding RNN, MCE takes 1) the original word embedding for raw encoding with no composition, and 2) a particular design of external memory in the style of Neural Turing Machines (NTM) for more complex composition, while all three encoding strategies are properly blended during decoding. An empirical study on Chinese-English translation shows that our model improves by 6.52 BLEU points upon a strong open-source NMT system, DL4MT. On the WMT14 English-French task, our single shallow system achieves BLEU=38.8, comparable with state-of-the-art deep models.
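
The channel-blending idea can be caricatured as a weighted mixture of three per-position vectors: the raw embedding, the RNN state and a content-based read from an external memory. Gate weights are supplied by hand here purely for illustration; in the model they would be predicted from context, and nothing below is the paper's actual parameterization.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_channel_state(embedding, rnn_state, memory, gate_logits):
    """Illustrative blending of three encoding channels for one source position:
    raw word embedding (no composition), RNN hidden state, and an external-memory
    read (more complex composition)."""
    read = softmax(memory @ rnn_state) @ memory        # content-based memory read
    channels = np.stack([embedding, rnn_state, read])  # (3, hidden)
    gates = softmax(gate_logits)                       # one weight per channel
    return gates @ channels

rng = np.random.default_rng(7)
d = 6
e = rng.normal(size=d)            # raw word embedding channel
h = rng.normal(size=d)            # encoder RNN hidden state channel
M = rng.normal(size=(4, d))       # hypothetical external memory with 4 slots
print(multi_channel_state(e, h, M, gate_logits=np.array([0.2, 1.0, 0.5])))
```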

#21 Early Syntactic Bootstrapping in an Incremental Memory-Limited Word Learner

Authors: Sepideh Sadeghi ; Matthias Scheutz

It has been suggested that early human word learning occurs across learning situations and is bootstrapped by syntactic regularities such as word order. Simulation results from ideal learners and from models assuming prior access to structured syntactic and semantic representations suggest that it is possible to jointly acquire word order and word meanings, and that learning improves as each language capability bootstraps the other. We first present a probabilistic framework for early syntactic bootstrapping in the absence of advanced structured representations, and then use our framework to study the utility of the joint acquisition of word order and word referents, and its onset, in a memory-limited incremental model. Comparing learning results in the presence and absence of joint acquisition of word order in different ambiguous contexts, improvement in word order showed an immediate onset, starting in early trials while being affected by context ambiguity. Improvement in word learning, on the other hand, was hindered in early trials where the acquired word order was imperfect, while being facilitated by word order learning in later trials as the acquired word order improved. Furthermore, our results show that the joint acquisition of word order and word referents facilitates one-shot learning of new words as well as inferring the intentions of the speaker in ambiguous contexts.

#22 Actionable Email Intent Modeling With Reparametrized RNNs

Authors: Chu-Cheng Lin ; Dongyeop Kang ; Michael Gamon ; Patrick Pantel

Emails in the workplace are often intentional calls to action for their recipients. We propose to annotate these emails for the action their recipients will take. We argue that our approach of action-based annotation is more scalable and theory-agnostic than traditional speech-act-based email intent annotation, while still carrying important semantic and pragmatic information. We show that our action-based annotation scheme achieves good inter-annotator agreement. We also show that we can leverage threaded messages from other domains, which exhibit comparable intents in their conversations, with domain-adaptive RAINBOW (Recurrently AttentIve Neural Bag-Of-Words). On a collection of datasets consisting of IRC, Reddit, and email, our reparametrized RNNs outperform common multitask/multidomain approaches on several speech-act related tasks. We also experiment with a minimally supervised scenario of email recipient action classification, and find that the reparametrized RNNs learn a useful representation.

#23 Event Representations With Tensor-Based Compositions

Authors: Noah Weber ; Niranjan Balasubramanian ; Nathanael Chambers

Robust and flexible event representations are important to many core areas in language understanding. Scripts were proposed early on as a way of representing sequences of events for such understanding, and have recently attracted renewed attention. However, obtaining effective representations for modeling script-like event sequences is challenging: it requires representations that can capture both event-level and scenario-level semantics. We propose a new tensor-based composition method for creating event representations. The method captures more subtle semantic interactions between an event and its entities, and yields representations that are effective at multiple event-related tasks. With the continuous representations, we also devise a simple schema generation method which produces better schemas than a prior method based on discrete representations. Our analysis shows that the tensors capture distinct usages of a predicate even when there are only subtle differences in their surface realizations.
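
A tensor-based (bilinear) composition of a predicate vector with an argument vector can be sketched as below; each tensor slice captures a different multiplicative interaction, which is the kind of structure that can separate, say, "throw a ball" from "throw a party". The dimensions and vectors are made up for the example and do not reproduce the paper's model.

```python
import numpy as np

def tensor_compose(predicate, argument, T):
    """Illustrative tensor-based composition: T has shape (k, d, d), and each of
    the k slices scores a different bilinear interaction between the predicate
    embedding and the argument embedding."""
    return np.einsum('kij,i,j->k', T, predicate, argument)

rng = np.random.default_rng(8)
d, k = 5, 4
T = rng.normal(size=(k, d, d))       # toy composition tensor
v_throw = rng.normal(size=d)         # predicate embedding
v_party = rng.normal(size=d)         # argument embedding
print(tensor_compose(v_throw, v_party, T))   # k-dimensional event representation
```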

#24 Linguistic Properties Matter for Implicit Discourse Relation Recognition: Combining Semantic Interaction, Topic Continuity and Attribution

Authors: Wenqiang Lei ; Yuanxin Xiang ; Yuwei Wang ; Qian Zhong ; Meichun Liu ; Min-Yen Kan

Modern solutions for implicit discourse relation recognition largely build universal models to classify all of the different types of discourse relations. In contrast to such learning models, we build our model from first principles, analyzing the linguistic properties of the individual top-level Penn Discourse Treebank (PDTB)-style implicit discourse relations: Comparison, Contingency and Expansion. We find that the semantic characteristics of each relation type and two cohesion devices---topic continuity and attribution---work together to produce these linguistic properties. We encode those properties as complex features and feed them into a Naive Bayes classifier, beating baselines (including deep neural network ones) to achieve a new state-of-the-art performance level. Over a strong, feature-based baseline, our system improves one-versus-other binary classification by 4.83% for the Comparison relation and 3.94% for Contingency, and four-way classification by 2.22%.
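
The pipeline of hand-crafted linguistic features feeding a Naive Bayes classifier can be sketched as follows, with toy feature dictionaries standing in for the paper's semantic-interaction, topic-continuity and attribution features; the feature names, values and labels are invented purely to show the plumbing.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import BernoulliNB

# Toy stand-ins for the complex linguistic features described above (illustrative only).
train_feats = [
    {"polarity_mismatch": 1, "topic_continuity": 0, "has_attribution": 1},
    {"polarity_mismatch": 0, "topic_continuity": 1, "has_attribution": 0},
    {"polarity_mismatch": 1, "topic_continuity": 0, "has_attribution": 0},
    {"polarity_mismatch": 0, "topic_continuity": 1, "has_attribution": 1},
]
train_labels = ["Comparison", "Expansion", "Comparison", "Expansion"]

vec = DictVectorizer()                         # turn feature dicts into vectors
X = vec.fit_transform(train_feats)
clf = BernoulliNB().fit(X, train_labels)       # Naive Bayes over binary features

test = vec.transform([{"polarity_mismatch": 1, "topic_continuity": 1, "has_attribution": 0}])
print(clf.predict(test))
```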

#25 Effective Broad-Coverage Deep Parsing

Authors: James Allen ; Omid Bahkshandeh ; William de Beaumont ; Lucian Galescu ; Choh Man Teng

Current semantic parsers either compute shallow representations over a wide range of input, or deeper representations in very limited domains. We describe a system that provides broad-coverage, deep semantic parsing designed to work in any domain using a core domain-general lexicon, ontology and grammar. This paper discusses how this core system can be customized for a particularly challenging domain, namely reading research papers in biology. We evaluate these customizations with some ablation experiments.